Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision
Authors
Abstract
We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods, which approximate each weight using a single low-precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure, and adaptively decide the number of quantization points on each quantized weight vector based on the error of its output. This allows us to achieve higher precision levels for important weights that greatly influence the outputs, yielding an ``effect of mixed precision'' without physical mixed-precision implementations (which require specialized hardware accelerators). Empirically, our method can be implemented with common operands, bringing almost no memory or computation overhead. We show that it outperforms a range of state-of-the-art methods on ImageNet classification and that it can be generalized to more challenging tasks such as PASCAL VOC object detection.
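To make the greedy construction concrete, below is a minimal NumPy sketch of the residual idea described in the abstract: each step quantizes the current residual to a low-bit vector, fits a full-precision scalar coefficient by least squares, and stops once enough points have been added. The helper names (quantize_uniform, multipoint_quantize), the uniform symmetric quantizer, and the weight-space stopping criterion are illustrative assumptions rather than the authors' exact procedure; in particular, the paper adapts the number of points to the error of the layer output, whereas this sketch uses weight reconstruction error as a stand-in.

```python
import numpy as np

def quantize_uniform(x, num_bits=2):
    # Round x onto a symmetric signed grid with 2**(num_bits-1) - 1 positive levels
    # (an illustrative quantizer, not necessarily the one used in the paper).
    qmax = 2 ** (num_bits - 1) - 1
    m = np.max(np.abs(x))
    if m == 0:
        return np.zeros_like(x)
    return np.clip(np.round(x / (m / qmax)), -qmax, qmax)

def multipoint_quantize(w, num_bits=2, max_points=4, tol=1e-2):
    # Greedily approximate w by sum_i a_i * q_i, where each q_i is a low-bit
    # vector and each a_i is a full-precision scalar coefficient.
    w = np.asarray(w, dtype=np.float64)
    residual = w.copy()
    points, coeffs = [], []
    for _ in range(max_points):
        q = quantize_uniform(residual, num_bits)
        if not np.any(q):
            break
        a = float(np.dot(residual, q) / np.dot(q, q))  # least-squares coefficient
        points.append(q)
        coeffs.append(a)
        residual = residual - a * q
        # Stop early once the weight-space reconstruction error is small enough;
        # the paper instead adapts the number of points to the output error.
        if np.linalg.norm(residual) <= tol * np.linalg.norm(w):
            break
    approx = sum(a * q for a, q in zip(coeffs, points))
    return points, coeffs, approx

# More points give a finer approximation for the weight vectors that need it.
rng = np.random.default_rng(0)
w = rng.standard_normal(64)
for k in (1, 2, 4):
    _, _, approx = multipoint_quantize(w, num_bits=2, max_points=k, tol=0.0)
    print(k, np.linalg.norm(w - approx) / np.linalg.norm(w))
```

Varying the point budget per weight vector, driven by the measured output error, is what lets more precision be spent on the weights that most influence the network output, producing the mixed-precision-like effect without mixed-precision hardware.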
Similar Resources
Mixed Precision Training
Increasing the size of a neural network typically improves accuracy but also increases the memory and compute requirements for training the model. We introduce a methodology for training deep neural networks using half-precision floating point numbers, without losing model accuracy or having to modify hyperparameters. This nearly halves memory requirements and, on recent GPUs, speeds up arithmetic...
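For reference, the half-precision training recipe this abstract describes is now commonly realized through automatic mixed precision in mainstream frameworks. The sketch below uses PyTorch's torch.cuda.amp as one such realization; the toy model, synthetic data, and hyperparameters are placeholders, and this is not the paper's original implementation.

```python
import torch
from torch import nn

# Toy model and synthetic data, purely to illustrate the autocast/GradScaler pattern.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()  # loss scaling guards FP16 gradients against underflow

for step in range(100):
    x = torch.randn(32, 128, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():        # forward pass runs in FP16 where it is safe
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()          # backward on the scaled loss
    scaler.step(optimizer)                 # unscales gradients, then steps the optimizer
    scaler.update()                        # adapts the loss scale over time
```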
Mixed-Precision Memcomputing
As the CMOS scaling laws break down because of technological limits, a radical departure from the processor-memory dichotomy is needed to circumvent the limitations of today’s computers. In-memory computing is a promising concept in which the physical attributes and state dynamics of nanoscale resistive memory devices organized in a computational memory unit are exploited to perform computation...
Sound Mixed-Precision Optimization with Rewriting
Finite-precision arithmetic computations face an inherent tradeoff between accuracy and efficiency. The points in this tradeoff space are determined, among other factors, by different data types but also evaluation orders. To put it simply, the shorter a precision's bit-length, the larger the roundoff error will be, but the faster the program will run. Similarly, the fewer arithmetic operations the prog...
Accelerating Scientific Computations with Mixed Precision Algorithms
a Department of Mathematics, University of Coimbra, Coimbra, Portugal b French National Institute for Research in Computer Science and Control, Lyon, France c Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA d Oak Ridge National Laboratory, Oak Ridge, TN, USA e University of Manchester, Manchester, United Kingdom f Department of Mathematical an...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i10.17054